Abstract Body


Background / Context: 

In contrast to randomized experiments, the estimation of unbiased treatment effects from
observational data requires an analysis that conditions on all confounding covariates.
Conditioning on covariates can be done via standard parametric regression techniques or
nonparametric matching such as propensity score (PS) matching. The regression or matching
estimators are causally unbiased (or at least consistent) if the selection mechanism is strongly
ignorable, i.e., if all confounding covariates are reliably measured (Rosenbaum & Rubin, 1983).
However, for practitioners, the strong ignorability assumption is not very informative because it
does not tell them which covariates should actually be included in a regression or PS analysis.
Moreover, researchers frequently presume that they might not completely succeed in removing
selection bias because their data set might lack some potentially confounding covariates, or
because they might only have unreliable measures or proxies of some crucial constructs. Thus, in
order to remove as much of the selection bias as possible, it is common advice to condition on all
or at least a large set of covariates. The general credo is “the more covariates, the less bias”:
conditioning on more covariates cannot do harm, i.e., increase the bias in the estimated
treatment effect.

However, recent studies have shown that conditioning on certain types of covariates, such as
instrumental variables (IVs) or collider variables, can actually amplify or induce bias (e.g.,
Bhattacharya & Vogt, 2007; Wooldridge, 2009). Neither IVs nor colliders are confounders
because, with respect to the data-generating model, they do not simultaneously determine the
outcome (Y) and the treatment (Z). IVs causally determine the treatment but are causally
unrelated to the outcome except for their indirect relation via treatment Z (see Figure 1).
Colliders are typically unrelated to both the outcome and the treatment but are themselves
determined by other variables that might affect treatment Z or outcome Y. Interestingly, though
IVs and colliders are unrelated to the outcome, including them in a regression or PS model may
result in a dramatically increased bias: the bias might be much larger than the bias of the naive
estimate (i.e., the simple mean difference between the treatment and control group, without any
covariate adjustments). Thus, if one knew which variables are instruments or colliders, one
should not condition on them when estimating the treatment effect. Though the prevalence of pure
instruments and collider variables is very likely low in practice, they clearly demonstrate the
counterintuitive fact that conditioning on some variables might actually increase bias.
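
The amplification by a pure instrument can be illustrated with a minimal simulation sketch. It is not taken from the studies cited above; the coefficients, the sample size, and the names `iv`, `u`, and `ols_slope_on_treatment` are assumptions chosen for illustration. One unobserved confounder `u` biases the naive estimate, and adding the instrument to the regression removes treatment variance that is unrelated to `u`, so the same confounding is divided by a smaller variance. Under these assumed values the population biases are .25/2.25 for the naive estimate and .25/1.25 after adjusting for the instrument.

```python
import numpy as np

def ols_slope_on_treatment(y, z, covariates=None):
    """Return the OLS coefficient of z in a regression of y on z (+ covariates)."""
    columns = [np.ones_like(z), z]
    if covariates is not None:
        columns.append(covariates)
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
n = 200_000
tau = 0.3                       # true treatment effect (illustrative value)
iv = rng.normal(size=n)         # pure instrument: affects Z only
u = rng.normal(size=n)          # unobserved confounder: affects Z and Y
z = 1.0 * iv + 0.5 * u + rng.normal(size=n)
y = tau * z + 0.5 * u + rng.normal(size=n)

bias_naive = ols_slope_on_treatment(y, z) - tau
bias_with_iv = ols_slope_on_treatment(y, z, iv) - tau
print(f"naive bias: {bias_naive:.3f}, bias after adjusting for IV: {bias_with_iv:.3f}")
```

In this sketch the instrument-adjusted estimate is more biased than the naive one, even though the instrument has no direct effect on the outcome.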

Even if IVs might be rare in practice, we are much more likely to be confronted with covariates
that are almost like IVs, that is, covariates that strongly determine treatment Z but are only
weakly related to the outcome (other than via Z). Due to their resemblance to pure IVs they are
called near-instrumental variables (near-IVs; Myers et al., 2011). Thus, as Figure 2 suggests,
near-IVs are clearly confounders and should definitely be conditioned on when estimating the
treatment effect, given that the strong ignorability assumption is met. However, if the strong
ignorability assumption is not met, near-IVs partially behave like pure IVs and amplify the
remaining bias. This is because conditioning on an observed confounder reduces overt bias but
simultaneously amplifies any remaining hidden bias (Pearl, 2010, 2011). Particularly for
near-IVs, bias amplification might dominate bias reduction, especially if the near-IV is strongly
related to treatment Z and if the extent of remaining bias is not negligibly small. Given that
conditioning on near-IVs might actually increase bias, some authors suggest excluding near-IVs
from causal analyses (e.g., Pearl, 2011). Similar arguments can be made for near-colliders, but
this is not the focus of this paper.

Purpose / Objective / Research Question / Focus of Study: 

In this paper we focus on the bias-amplifying effect of near-IVs, not only of a single near-IV
in the context of a simple data-generating model (e.g., Myers et al., 2011; Pearl, 2010), but of
multiple near-IVs in the context of more complex and realistic data-generating models. While for
the single near-IV case with only one or two confounding covariates closed analytic formulas for
the extent of bias amplification can be derived (Pearl, 2010), this is essentially impossible for
most of the data-generating models that involve more than two confounders. Thus, with the
exception of some formal derivations, we investigate the effect of conditioning on near-IVs on
bias reduction using simulated data that involve many (interrelated) confounders. In our
simulations we vary the confounders’ (i) degree of being a near instrument, (ii) correlation
structure, (iii) heterogeneity with respect to their relation to treatment Z and outcome Y (i.e.,
the strength and direction of the relation), and (iv) measurement reliability. The main research
questions can be summarized as follows:

(A) Given that strong ignorability is not met, how strong can the bias-amplifying effects of
confounders, particularly near-IVs, be?

(B) Given the bias-amplifying effect of near-IVs, should one deliberately exclude potential
near-IVs from a causal analysis?

Research Design: 

We based our simulation study on a data-generating model with two continuous endogenous
variables, the outcome Y and the treatment variable Z, and one hundred exogenous
confounders $U = (U_1, \dots, U_k, \dots, U_{100})'$. Figure 3 shows the data-generating model.
In simulating the data we modeled the outcome Y as a linear function of the confounder vector U
and treatment Z, $Y = U'\beta + \tau Z + \varepsilon_Y$, where $\beta = (\beta_1, \dots, \beta_{100})'$
is the column vector of path coefficients. The continuous treatment variable Z was generated
according to $Z = U'\alpha + \varepsilon_Z$, where $\alpha = (\alpha_1, \dots, \alpha_{100})'$
is the column vector of corresponding path coefficients. For each simulated scenario, we
changed the values of $\alpha$, $\beta$, and $\rho$ (the covariance among the confounders). By
setting $\alpha_k = .9$ and $\beta_k = .1$, we constructed near-IVs, which show a stronger
relation with the treatment Z than with the outcome Y. Outcome-related confounders were
generated by setting $\alpha_k = .5$ and $\beta_k = .5$. In the scenarios with homogeneous
confounders, all confounders come from a single confounder type, i.e.,
$\alpha_1 = \dots = \alpha_{100}$ and $\beta_1 = \dots = \beta_{100}$. For the heterogeneous
confounder scenarios we generated two different confounder types, evenly split into 50 near-IVs
and 50 outcome-related confounders. For each simulated scenario we ran 101 regressions in order
to estimate the treatment effect. The first model was always estimated without the inclusion of
any confounders. Then we successively increased the number of included confounders from
$j = 1$ to 100 such that the estimated models are given by
$\hat{Y} = b_0 + \hat{\tau}_j Z + b_1 U_1 + \dots + b_j U_j$. The plots in the results section
show the average remaining bias in percent, with the x-axis representing the number of included
confounders, from 0 to 100. (More details are given in the Appendix.)

Findings / Results: 

Types of Covariates (Near-Instrumental Variables and Outcome-Related Confounders). 

Figure 4 shows how the two different sets of uncorrelated covariates affect bias reduction.
Conditioning on outcome-related confounders, i.e., confounders that are strongly related to the
outcome but only weakly related to the treatment, shows an almost linear pattern of bias
reduction (dashed line): each confounder removes approximately the same amount of bias. For
example, if one conditions on 80% of the confounders (i.e., 20% are assumed to be unobserved),
almost 80% of the bias is removed from the initially biased treatment effect (20% remains). The
result is quite different for near-IVs (solid line in Figure 4). Conditioning on 80% of the
confounders removes only 40% of the bias (60% remains). The weaker bias reduction of the near-IVs
as compared to the outcome-related confounders is due to bias amplification. The difference
between the dashed line of the outcome-related covariates and the solid line for the near-IVs
indicates the degree of bias amplification (as compared to the outcome-related confounders).
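
A hedged back-of-the-envelope derivation may help explain the shape of the two curves. It is a sketch, not a formula taken from the study: it assumes independent confounders ($\rho = 0$) with $\mathrm{Var}(U_1 + \dots + U_{100}) = 1$, homogeneous coefficients $\alpha_k = \alpha$ and $\beta_k = \beta$, and a treatment error variance $\sigma_Z^2$. The symbols $m_j$ and $\sigma_Z^2$ are introduced here for illustration only. With the first $j$ confounders in the regression, the unobserved share is $m_j = (100 - j)/100$, the part of Z not explained by the included confounders has variance $\alpha^2 m_j + \sigma_Z^2$, and its covariance with the omitted part of the outcome equation is $\alpha \beta m_j$. The large-sample bias and its ratio to the initial bias are therefore

```latex
\mathrm{Bias}_j = \frac{\alpha \beta \, m_j}{\alpha^2 m_j + \sigma_Z^2},
\qquad
\frac{\mathrm{Bias}_j}{\mathrm{Bias}_0}
  = \frac{m_j \, (\alpha^2 + \sigma_Z^2)}{\alpha^2 m_j + \sigma_Z^2}.
```

When $\alpha^2$ is small relative to $\sigma_Z^2$, the ratio is close to $m_j$, which corresponds to the nearly linear pattern. As $\alpha$ grows, the ratio exceeds $m_j$ for every $0 < m_j < 1$, which is the amplification of the remaining bias under these assumptions.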

Correlation Structure. The pattern of bias reduction changes strongly once we allow for
correlated confounders. If confounders are correlated with each other ($\rho = .3$), then the
extent of bias reduction increases and bias amplification diminishes (Figure 5). First, bias
reduction increases because the confounders included in the analysis partially pick up the
selection bias due to the unobserved but correlated confounders. Thus, conditioning on only 20%
of the confounders leaves about 35% and 10% of the bias for the case of near-IVs and
outcome-related confounders, respectively. Bias reduction is weaker for near-IVs than for
outcome-related confounders because, as before, the remaining bias is amplified. However, since
the remaining bias is smaller than in the case of independent confounders, the extent of bias
amplification is smaller for correlated confounders, as the difference between the solid and
dashed line indicates. In any case, for uncorrelated or correlated confounders, conditioning on
an additional near-IV (or outcome-related confounder) always reduced bias and never increased
it, though the bias-reducing potential of near-IVs shrinks as the extent of remaining bias
(i.e., violation of the strong ignorability assumption) increases.

Heterogeneity of Covariates. When confounders are different (i.e., we now have two groups of
confounders in our data-generating model: near-IVs and outcome-related confounders) the
situation changes dramatically. The extent of bias reduction now depends on which covariates
are included in and which ones are omitted (i.e., unobserved) from the regression or PS model.
Moreover, bias might even increase. Figure 6 shows two extreme cases. In the first case (solid
line) the near-IVs are included first and the outcome-related confounders only afterwards. As
Figure 6 shows, if we condition only on near-IVs, their bias-amplifying effect increases the
bias in the treatment effect even more. Because bias amplification dominates bias reduction, the
overall bias increases. Only when outcome-related covariates are included is bias actually
reduced. In the second case (dashed line) outcome-related covariates are included first and
near-IVs afterwards. In this case, bias decreases as one conditions on more and more covariates.
Near-IVs no longer cause an increasing bias because the remaining bias (after conditioning on
the outcome-related confounders) is too small to be amplified beyond the bias-reducing effect
of the near-IVs. These results suggest that one should not condition on near-IVs if there is
still considerable remaining bias. However, in practice we rarely know which variables are
near-IVs and how much bias remains after conditioning on some covariates.

Reliability of Covariates. In general, covariates should be reliably measured because
measurement error reduces a covariate’s potential for removing selection bias (e.g., Steiner,
Cook & Shadish, 2011). However, if a covariate, like a near-instrument, has a tendency to
increase bias, measurement error actually attenuates the bias-amplifying effect (but also the
bias-reducing effect). Figure 7 shows the remaining bias for near-IVs measured with different
reliabilities. When conditioning on a highly reliable near-IV, bias amplification is stronger,
resulting in a more heavily biased treatment effect. However, this pattern only occurs as long as
bias amplification dominates bias reduction. Once the remaining bias is small enough (due to the
inclusion of most confounders) the bias-amplifying effect is dominated by the bias-reducing
effect and, thus, bias decreases, but due to measurement error it decreases at an attenuated
rate. Thus, whether conditioning a causal analysis on a near-IV actually increases or decreases
bias depends not only on the extent of remaining bias but also on how reliably the near-IV has
been measured.

Conclusions: 

This paper demonstrates the bias-amplifying effect of near-instrumental variables under different
conditions. Besides reducing selection bias, near-IVs also amplify remaining bias (i.e., hidden
bias that has not been removed by the covariates already in the regression or PS model). Bias
amplification might exceed bias reduction and thus increase the bias instead of reducing it. The
simulation results also show that the number of covariates included in a regression or PS model
is not an indicator of how much bias is removed. Even with 50% or more of the confounders
included, bias might still exceed the bias of the naive estimate. Moreover, the correlation between
observed and unobserved confounders can considerably weaken bias amplification. Given the
potentially negative effects of near-IVs, in practice one should carefully think about (i) whether
near-IVs are among the covariates one considers for inclusion in a regression or PS model and (ii)
whether the potential near-IVs might increase instead of reduce selection bias. If one presumes
that a near-IV might increase selection bias, one might decide not to condition on it when
estimating a causal effect.

However, the problem in practice is that we rarely know for sure which covariates are near-IVs
and whether they actually would increase the bias if conditioned on. Moreover, lacking results
from within-study comparisons that compare randomized experiments to observational data, it is
hard to say how prevalent the issue of bias-increasing near-IVs is (they definitely amplify bias
but do not necessarily increase it). So what can we do in practice? First, the results make it very
clear that substantive theories about the selection process and the outcome model are very
important. They help in identifying potential near instruments and in selecting the confounding
covariates for removing bias. If one suspects near-IVs in the observed set of confounders, one can
probe the treatment effect’s sensitivity to the inclusion or exclusion of these covariates in the
model. However, frequently we do not have strong enough theories to make a good call about
which variables might be near-IVs and how much bias we might have removed by the covariates
included in the model. Given the lack of methodological evidence so far, we can only speculate:
We believe that the bias-amplifying effect of near instruments is usually rather weak (i.e., does
not increase the bias) given that one has a rich set of conditioning covariates, i.e., a large number
of reliably measured covariates including pretest measures of the outcome. Thus, we are
cautiously inclined to suggest that near-IVs should be included in a regression or PS model when
estimating causal treatment effects, unless substantive theory suggests the opposite. In any case,
as with randomized experiments, one should never rely on the results of a single observational
data set. Replication matters!
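
The sensitivity probe mentioned above can be sketched as follows. This is a minimal illustration under assumed names and values (`tau_hat`, `exclusion_sensitivity`, the toy coefficients, and the hidden confounder `h` are all hypothetical), not a procedure specified in the study. It fits the outcome regression once with all observed covariates and once without the suspected near-IVs and reports both treatment estimates.

```python
import numpy as np

def tau_hat(y, z, covariates):
    """OLS coefficient of treatment z in a regression of y on z and covariates."""
    X = np.column_stack([np.ones(len(z)), z, covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def exclusion_sensitivity(y, z, covariates, suspected_columns):
    """Treatment estimate with all covariates vs. without the suspected near-IVs."""
    keep = np.setdiff1d(np.arange(covariates.shape[1]), suspected_columns)
    return tau_hat(y, z, covariates), tau_hat(y, z, covariates[:, keep])

# Toy data: column 0 is a near-IV, column 1 an outcome-related confounder,
# and h is a hidden confounder that is never available to the analyst.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=(n, 2))
h = rng.normal(size=n)
z = 0.9 * x[:, 0] + 0.5 * x[:, 1] + 0.5 * h + rng.normal(size=n)
y = 0.3 * z + 0.1 * x[:, 0] + 0.5 * x[:, 1] + 0.5 * h + rng.normal(size=n)

full, reduced = exclusion_sensitivity(y, z, x, suspected_columns=[0])
print(f"with near-IV: {full:.3f}, without near-IV: {reduced:.3f}")
```

A noticeable gap between the two estimates indicates that the result is sensitive to the suspected near-IV. The comparison alone does not reveal which estimate is less biased, since that depends on the unknown amount of hidden bias.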
Appendix B, Simulation Design


We based our simulation study on a data-generating model with two continuous endogenous
variables, the outcome Y and the treatment variable Z, and one hundred exogenous
confounders $U = (U_1, \dots, U_k, \dots, U_{100})'$. Here we chose 100 confounders since this
allows us to conveniently interpret results for the number of included covariates in terms of a
percentage (%), which directly allows generalization to data sets with fewer or more than 100
covariates. Figure 3 shows the causal diagram of the data-generating model. In simulating the
data we modeled the outcome Y as a linear function of the confounder vector U and treatment Z,
$Y = U'\beta + \tau Z + \varepsilon_Y$, where $\beta = (\beta_1, \dots, \beta_{100})'$ is the
column vector of path coefficients, $\tau$ is the treatment effect, which we assume to be a
structural constant of .3, and $\varepsilon_Y$ is the normally distributed error term.
Similarly, the continuous treatment variable Z, which might indicate the exact dosage of a
treatment (e.g., of a drug), linearly depends on the confounders U, $Z = U'\alpha + \varepsilon_Z$,
where $\alpha = (\alpha_1, \dots, \alpha_{100})'$ is the column vector of corresponding path
coefficients, and $\varepsilon_Z$ is the normally distributed error term. The confounders U were
generated from a multivariate normal distribution, $U \sim N(0, \Sigma)$, where $\Sigma$ is a
100 x 100 variance-covariance matrix satisfying $\mathrm{Var}(U_1) = \dots = \mathrm{Var}(U_{100})$
and $\mathrm{Var}(U_1 + \dots + U_{100}) = 1$, and has covariances equal to $\rho$.

In each simulation scenario, we changed the values of $\alpha$, $\beta$, and $\rho$. By setting
$\alpha_k = .9$ and $\beta_k = .1$, we constructed near-IVs, which show a stronger relation with
the treatment Z than with the outcome Y. Likewise, outcome-related confounders were generated by
setting $\alpha_k = .5$ and $\beta_k = .5$. In the scenarios with homogeneous confounders, all
confounders come from a single confounder type, i.e., $\alpha_1 = \dots = \alpha_{100}$ and
$\beta_1 = \dots = \beta_{100}$. For the heterogeneous confounder scenario we generated two
different confounder types, evenly split into 50 near-IVs and 50 outcome-related confounders.
Coefficients only varied across the two confounder types, that is, $\alpha_x \neq \alpha_y$ or
$\beta_x \neq \beta_y$ for confounders $x$ and $y$ of different types, and were identical within
a given confounder type, i.e.,
$(\alpha_1 = \dots = \alpha_{50}) \neq (\alpha_{51} = \dots = \alpha_{100})$ and
$(\beta_1 = \dots = \beta_{50}) \neq (\beta_{51} = \dots = \beta_{100})$. Also, to specify
situations where the confounders U are independent of or dependent on each other, we set
$\rho = 0$ and $\rho = .3$, respectively. Finally, in order to investigate the effect of the
reliability of confounders, we added measurement errors to the confounders U. Thus, fallibly
measured confounders are generated according to $U^* = U + \varepsilon_U$, where
$\varepsilon_U$ are independently normally distributed errors that have zero means and standard
deviations such that reliabilities are 1.0, .9, .7, and .5.
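
The data-generating steps described above can be sketched in code. This is a minimal illustration, not the study's implementation, and it rests on several assumptions that the text does not fix: $\rho$ is treated as a pairwise correlation (so that the variance constraint on the sum stays feasible for $\rho = .3$), both structural error terms have standard deviation 1, reliability is defined as true-score variance divided by observed variance, and the sample size `n` is arbitrary. The function name `simulate_data` and its parameters are hypothetical.

```python
import numpy as np

def simulate_data(alpha, beta, rho=0.0, reliability=1.0, tau=0.3, n=1000, rng=None):
    """Draw one data set (observed U, Z, Y) from the linear model sketched above."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    p = alpha.size
    # Equal variances v with pairwise correlation rho, scaled so Var(sum of U) = 1.
    v = 1.0 / (p + p * (p - 1) * rho)
    sigma = v * ((1.0 - rho) * np.eye(p) + rho * np.ones((p, p)))
    u = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    z = u @ alpha + rng.normal(size=n)            # error SD of 1 is an assumption
    y = u @ beta + tau * z + rng.normal(size=n)   # error SD of 1 is an assumption
    # Classical measurement error: reliability = Var(U) / (Var(U) + Var(error)).
    error_sd = np.sqrt(v * (1.0 - reliability) / reliability)
    u_observed = u + rng.normal(scale=error_sd, size=(n, p))
    return u_observed, z, y

# Homogeneous near-IV scenario with correlated, fallibly measured confounders.
alpha = np.full(100, 0.9)
beta = np.full(100, 0.1)
u_obs, z, y = simulate_data(alpha, beta, rho=0.3, reliability=0.7,
                            rng=np.random.default_rng(0))
```

A heterogeneous scenario would pass coefficient vectors whose first 50 entries take the near-IV values and whose last 50 entries take the outcome-related values.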

In each scenario, we simulated one thousand data sets of U, Z, and Y. For each data set, we
regressed Y on Z and a subset of U. In the first regression model we did not account for any
confounders, such that the estimated model is $\hat{Y} = b_0 + \hat{\tau}_0 Z$, with
$\hat{\tau}_0$ representing the estimated treatment effect (in this case the initially biased
treatment effect). Next we estimated the model with one confounder included, then with two, and
so on. More generally, with $j$ covariates in the model, the estimated model is given by
$\hat{Y} = b_0 + \hat{\tau}_j Z + b_1 U_1 + \dots + b_j U_j$, for $j = 1, \dots, 100$. We
repeated this procedure 1,000 times ($k = 1, \dots, 1000$), averaged the estimated treatment
effects $\hat{\tau}_{kj}$ across iterations, and computed the remaining bias. The plots show the
average remaining bias in percent, with the x-axis representing the number of included
confounders, from 0 to 100.
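
The sequence of regressions and the averaging step can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: remaining bias is assumed to be the bias with $j$ confounders expressed as a percentage of the initial bias with no confounders, and the demo uses a reduced design (10 independent confounders, 100 replications, error standard deviations of 1) to keep the run short. The function names are hypothetical.

```python
import numpy as np

def treatment_estimates(u, z, y):
    """Estimate tau with 0, 1, ..., p confounders included (p + 1 regressions)."""
    n, p = u.shape
    estimates = np.empty(p + 1)
    for j in range(p + 1):
        # Intercept, treatment, and the first j confounders.
        X = np.column_stack([np.ones(n), z, u[:, :j]])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[j] = coef[1]
    return estimates

def remaining_bias_percent(mean_estimates, tau):
    """Bias with j confounders as a percentage of the initial (j = 0) bias."""
    return 100.0 * (mean_estimates - tau) / (mean_estimates[0] - tau)

# Reduced demo: homogeneous, independent near-IVs (alpha = .9, beta = .1).
rng = np.random.default_rng(0)
tau, p, n, reps = 0.3, 10, 4000, 100
alpha, beta = np.full(p, 0.9), np.full(p, 0.1)
runs = []
for _ in range(reps):
    u = rng.normal(scale=np.sqrt(1.0 / p), size=(n, p))  # Var(sum of U) = 1
    z = u @ alpha + rng.normal(size=n)
    y = u @ beta + tau * z + rng.normal(size=n)
    runs.append(treatment_estimates(u, z, y))

curve = remaining_bias_percent(np.mean(runs, axis=0), tau)
print(np.round(curve, 1))
```

The resulting `curve` plays the role of one line in the result plots: its index is the number of included confounders and its value is the average remaining bias in percent.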
Appendix B, Tables and Figures


Figure 1. Causal diagram illustrating an instrumental variable.

Figure 2. Causal diagram illustrating a near-instrumental variable.

Figure 4. Remaining bias after conditioning on independent homogeneous confounders.

Figure 5. Remaining bias after conditioning on correlated homogeneous confounders.

Figure 6. Remaining bias after conditioning on independent heterogeneous confounders.

Figure 7. Remaining bias after conditioning on unreliably measured near-IVs and
outcome-related confounders. All confounders are independent.